Audiovisual speech recognition using multiscale nonlinear image decomposition

نویسندگان

  • Iain A. Matthews
  • J. Andrew Bangham
  • Stephen J. Cox
چکیده

There has recently been increasing interest in the idea of enhancing speech recognition by the use of visual information derived from the face of the talker. This paper demonstrates the use of nonlinear image decomposition, in the form of a ‘sieve’, applied to the task of visual speech recognition. Information derived from the mouth region is used in visual and audiovisual speech recognition of a database of the letters A-Z for four talkers. A scale histogram is generated directly from the grayscale pixels of a window containing the talkers mouth on a per frame basis. Results are presented for visual-only, audio-only and in a simple audiovisual case.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Scale Based Features for Audiovisual Speech Recognition

This paper demonstrates the use of nonlinear image decomposition, in the form of a sieve, applied to the task of audiovisual speech recognition of a database of the letters A–Z for ten talkers. A scale based feature vector is formed directly from the grayscale pixels of an image containing the talkers mouth on a per frame basis. This is independent of image amplitude and position information an...

متن کامل

A Multiscale Image Representation Using Hierarchical (BV, L2 ) Decompositions

We propose a new multiscale image decomposition which offers a hierarchical, adaptive representation for the different features in general images. The starting point is a variational decomposition of an image, f = u0 + v0, where [u0, v0] is the minimizer of a J-functional, J(f, λ0;X,Y ) = infu+v=f { ‖u‖X + λ0‖v‖pY } . Such minimizers are standard tools for image manipulations (e.g., denoising, ...

متن کامل

Lipreading Using Shape, Shading and Scale

This paper compares three methods of lipreading for visual and audio-visual speech recognition. Lip shape information is obtained using an Active Shape Model (ASM) lip tracker but is not as effective as modelling the combined shape and enclosed greylevel surface using an Active Appearance Model (AAM). A nontracked alternative is a nonlinear transform of the image using a multiscale spatial anal...

متن کامل

Modelling asynchrony in speech using elementary single-signal decomposition

Although the possibility of asynchrony between different components of the speech spectrum has been acknowledged, its potential effect on automatic speech recogniser performance has only recently been studied. This paper presents the results of continuous speech recognition experiments in which such asynchrony is accommodated using a variant of HMM decomposition. The paper begins with an invest...

متن کامل

Classification of emotional speech using spectral pattern features

Speech Emotion Recognition (SER) is a new and challenging research area with a wide range of applications in man-machine interactions. The aim of a SER system is to recognize human emotion by analyzing the acoustics of speech sound. In this study, we propose Spectral Pattern features (SPs) and Harmonic Energy features (HEs) for emotion recognition. These features extracted from the spectrogram ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 1996